Fast Data Anonymization with Low Information Loss
Authors
Abstract
Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy preserving paradigms of k-anonymity and ℓ-diversity. k-anonymity protects against the identification of an individual’s record. ℓ-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) The information loss metrics are counter-intuitive and fail to capture data inaccuracies inflicted for the sake of privacy. (ii) ℓ-diversity is solved by techniques developed for the simpler k-anonymity problem, which introduces unnecessary inaccuracies. (iii) The anonymization process is inefficient in terms of computation and I/O cost. In this paper we propose a framework for efficient privacy preservation that addresses these deficiencies. First, we focus on one-dimensional (i.e., single attribute) quasi-identifiers, and study the properties of optimal solutions for k-anonymity and ℓ-diversity, based on meaningful information loss metrics. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multi-dimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the state-of-the-art, in terms of execution time and information loss.
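To make the two privacy notions in the abstract concrete, the following is a minimal Python sketch for a one-dimensional quasi-identifier. It is not the paper's algorithm: it uses a naive sort-and-cut grouping (which costs O(n log n) because of the sort, not linear time), it checks only the simplest "distinct values" form of ℓ-diversity, and it uses the summed width of each group's value range as a stand-in information loss measure. The names `Record`, `partition_sorted`, `is_k_anonymous`, `is_distinct_l_diverse`, and `total_range_width` are hypothetical and chosen for illustration only.

```python
from collections import namedtuple

# Hypothetical record: one numeric quasi-identifier plus one sensitive value.
Record = namedtuple("Record", ["qi", "sensitive"])


def partition_sorted(records, k):
    """Sort by the 1-D quasi-identifier and cut into consecutive groups of >= k records.

    Naive illustration only; this is not the heuristic proposed in the paper.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(records, key=lambda r: r.qi)
    if len(ordered) < k:
        raise ValueError("fewer than k records; no k-anonymous grouping exists")
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Fold a short trailing group into its predecessor so every group has >= k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def is_k_anonymous(groups, k):
    """Every group must contain at least k records."""
    return all(len(group) >= k for group in groups)


def is_distinct_l_diverse(groups, l):
    """Every group must contain at least l distinct sensitive values."""
    return all(len({r.sensitive for r in group}) >= l for group in groups)


def total_range_width(groups):
    """Assumed loss proxy: sum of (max - min) of the quasi-identifier per group."""
    return sum(
        max(r.qi for r in group) - min(r.qi for r in group) for group in groups
    )


if __name__ == "__main__":
    # Made-up example data (age, diagnosis); not taken from the paper.
    records = [
        Record(23, "flu"), Record(25, "asthma"), Record(31, "flu"),
        Record(34, "flu"), Record(47, "diabetes"), Record(52, "asthma"),
        Record(58, "flu"),
    ]
    groups = partition_sorted(records, k=2)
    print([len(g) for g in groups])
    print(is_k_anonymous(groups, 2), is_distinct_l_diverse(groups, 2))
    print(total_range_width(groups))
```

On the made-up data above, the grouping is 2-anonymous but not 2-diverse, because one group holds two records with the same sensitive value. This illustrates drawback (ii) in the abstract: a grouping built only for k-anonymity does not automatically satisfy ℓ-diversity.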
Similar resources
Utility-preserving anonymization for health data publishing
BACKGROUND: Publishing raw electronic health records (EHRs) may be considered a breach of the privacy of individuals because they usually contain sensitive information. A common practice in privacy-preserving data publishing is to anonymize the data before publishing, and thus satisfy privacy models such as k-anonymity. Among various anonymization techniques, generalization is the most c...
Trading Privacy for Information Loss in the Blink of an Eye
The publishing of data with privacy guarantees is a task typically performed by a data curator who is expected to provide guarantees for the data he publishes in quantitative fashion, via a privacy criterion (e.g., k-anonymity, l-diversity). The anonymization of data is typically performed off-line. In this paper, we provide algorithmic tools that facilitate the negotiation for the anonymizatio...
k-anonymity based framework for privacy preserving data collection in wireless sensor networks
In this paper, the notion of k-anonymity is adopted for use in wireless sensor networks (WSN) as a security framework with two levels of privacy. A base level of privacy is provided for the data shared with a semi-trusted sink, and a deeper level of privacy is provided against eavesdroppers. In the proposed method, some portions of the data are encrypted and the rest is generalized. Generalization shortens...
Privacy Preserving Data Publishing Based on k-Anonymity by Categorization of Sensitive Values
In many organizations, large amounts of personal data are collected and analyzed by data miners for research purposes. However, the data collected may contain sensitive information which should be kept confidential. The study of privacy-preserving data publishing (PPDP) focuses on removing privacy threats while, at the same time, preserving useful information in the released data for data m...
Novel Approaches for Privacy Preserving Data Mining in k-Anonymity Model
In privacy preserving data mining, anonymization-based approaches have been used to preserve the privacy of an individual. Existing literature addresses various anonymization-based approaches for preserving the sensitive private information of an individual. The k-anonymity model is one of the most widely used anonymization-based approaches. However, the anonymization-based approaches suffer from the ...